Parallel Scheduling Algorithms*
Eliezer Dekel and Sartaj Sahni
University  of  Minnesota 


Abstract: 

We obtain fast parallel algorithms for several scheduling
problems. Some of the problems considered are: scheduling
to minimize the number of tardy jobs; job sequencing with
deadlines; scheduling to minimize earliness and tardiness
penalties; channel assignment; and minimizing the mean
finish time. The shared memory model of parallel computers is
used.

Keywords and Phrases: parallel algorithm, complexity,

1. Introduction

With the continuing dramatic decline in the cost of
hardware, it is becoming feasible to economically build
computers with thousands of processors. In fact, Batcher [3]
describes a computer (MPP) with 16,384 processors that is
currently being built for NASA. In coming years, one can
expect to see computers with a hundred thousand or even a
million processing elements. This expectation has motivated
the study of parallel algorithms.

Since the complexity of a parallel algorithm depends
very much on the architecture of the parallel computer on
which it is run, it is necessary to keep the architecture in
mind when designing the algorithm. Several parallel
architectures have been proposed and studied. In this paper, we
shall deal directly only with the single instruction stream,
multiple data stream (SIMD) model. Our techniques and
algorithms readily adapt to the other models (e.g., multiple
instruction stream multiple data stream (MIMD) and data flow
models). SIMD computers have the following characteristics:

(1) They consist of p processing elements (PEs). The PEs
are indexed 0, 1, ..., p-1 and an individual PE may be
referenced as in PE(i). Each PE is capable of performing
the standard arithmetic and logical operations. In
addition, each PE knows its index.

(2)  Each  PE  has  some  local  memory. 

(3)  The  PEs  are  synchronized  and  operate  under  the  control 
of  a  single  instruction  stream. 

(4) An enable/disable mask can be used to select a subset
of the PEs that are to perform an instruction. Only the
enabled PEs will perform the instruction. The remaining
PEs will be idle. All enabled PEs execute the same
instruction. The set of enabled PEs can change from
instruction to instruction.


While several SIMD models have been proposed, in this
paper we shall deal explicitly with only the shared memory
model (SMM). In this model, there is a large common memory
that is shared by all the PEs. It is assumed that any PE
can access any word of this common memory in O(1) time.
When two or more PEs access the same word simultaneously, we
shall say that a conflict has occurred. If all the PEs (at
least two) that simultaneously access the same word wish to
write in it, it is called a write conflict. If all wish to
read, then it is a read conflict. Write conflicts may be
permitted so long as all the PEs wish to write the same
piece of information. As far as our discussion here is
concerned, no read or write conflicts are allowed. A
description of some of the other SIMD models can be found in
[5].


Most algorithmic studies of parallel computation have
been based on the SMM [1,2,4,7,10,11,19,21,22]. Parallel
matrix and graph algorithms for the SMM have been developed
by Agerwala and Lint [1], Arjomandi [2], Csanky [4],
Eckstein [7], Hirschberg, Chandra, and Sarwate [10], and Savage
[22]. Hirschberg [11], Muller and Preparata [19], and
Preparata [21] have considered the sorting problem for SMMs.
The results of Muller and Preparata [19] and Preparata [21]
will be made use of in this paper. In these two papers, it
is shown that n numbers can be sorted in O(log n) time if n^2
PEs are available, and in O(log^2 n) time when n PEs are
available.

Dekel and Sahni [6] develop a design technique for
parallel algorithms that is based on binary computation
trees. This design technique is illustrated using several
examples from scheduling theory. Some of the scheduling
problems considered by them are:

P1: Schedule many machines to minimize maximum lateness
when all jobs have a processing time p_i = 1.

P2: Schedule one machine to minimize maximum lateness.
Preemptions are permitted.

P3: Schedule one machine to minimize the number of tardy
jobs.

P4: The job sequencing with deadlines problem.


The complexity of their parallel algorithms for all the
above problems is O(log^2 n).

When measuring the effectiveness of a parallel algorithm,
one needs to consider both its complexity and its cost in
terms of the number of PEs used. The effectiveness of
processor utilization (EPU) is defined with respect
to a parallel algorithm and the fastest known sequential
(i.e., single processor) algorithm for the same problem.
Let P be a problem and A a parallel algorithm for P. We
define:

            complexity of the fastest sequential algorithm for P
EPU(P,A) = ------------------------------------------------------
                number of PEs used by A * complexity of A

The algorithm of [6] for problem P1 above uses n/2 PEs
and has a complexity of O(log^2 n). The fastest sequential
algorithm known for this problem is due to Horn [4] and runs
in O(n log n) time. So, the EPU of the parallel algorithm of
[6] for P1 is O(n log n/(n log^2 n)) = O(1/log n).

The best EPU one can hope for is O(1). Few parallel
algorithms achieve this EPU. Dekel and Sahni [6] present
some algorithms that do. One that we shall need here is for
the partial sums problem. We are given n numbers
a_1, a_2, ..., a_n and are required to compute

$$\bigoplus_{i=1}^{j} a_i, \quad 1 \le j \le n,$$

where ⊕ is any associative operator (e.g., max, min, +, *).
Their algorithm runs in O(log n) time and uses n/log n PEs.
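
To make the partial sums problem concrete, the following is a minimal sequential sketch of a blocked scheme that simulates about n/log n PEs, each owning a chunk of roughly log n elements. It is the standard blocked prefix computation and is not claimed to be the exact algorithm of [6]; the function name `partial_sums` and the chunking policy are assumptions made for illustration.

```python
import math
from operator import add


def partial_sums(a, op=add):
    """Return [a1, a1 op a2, ..., a1 op ... op an] for an associative op."""
    n = len(a)
    if n == 0:
        return []
    # Assumed chunk size: about log2(n) elements per simulated PE.
    block = max(1, math.ceil(math.log2(n)))
    chunks = [a[i:i + block] for i in range(0, n, block)]

    # Phase 1: every simulated PE scans its own chunk.
    local = []
    for chunk in chunks:
        acc = [chunk[0]]
        for value in chunk[1:]:
            acc.append(op(acc[-1], value))
        local.append(acc)

    # Phase 2: scan the chunk totals (done sequentially here; a parallel
    # machine would use a binary computation tree for this step).
    totals = [acc[-1] for acc in local]
    for i in range(1, len(totals)):
        totals[i] = op(totals[i - 1], totals[i])

    # Phase 3: every PE combines the total of all earlier chunks
    # with its local results, keeping the left-to-right operand order.
    out = list(local[0])
    for prefix, acc in zip(totals, local[1:]):
        out.extend(op(prefix, value) for value in acc)
    return out
```

For example, `partial_sums([3, 1, 4, 1, 5])` yields the running sums and `partial_sums([3, 1, 4, 1, 5], max)` the running maxima.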

In this paper, we consider several scheduling problems.
Fast parallel algorithms are obtained for each. In each
case, the complexity analysis is carried out on the
assumption that as many PEs as needed are available. This is in
conformance with the assumption made in almost all the
research work done on parallel computing. This assumption
is of course unrealistic. A parallel algorithm will
eventually be run on a machine with a finite number (say k) of
PEs. It should be easy to see that all our algorithms are
easily adapted to the case of k PEs. If our algorithm has
complexity O(g(n)) using f(n) PEs, then with k PEs, k <
f(n), its complexity is O(g(n)f(n)/k). We shall continue
with tradition, and explicitly analyse our algorithms only
for the case when as many PEs as needed are available.

In  Sections  2  and  3,  we  consider  two  relatively  simple 
examples.  The  first  of  these  is  to  minimize  the  finish  time 
when  m  identical  machines  are  available.  The  second  example 
is  to  minimize  the  mean  finish  time  when  m  uniform  machines 
are available. In Sections 4, 5, 6, and 7, we respectively
consider the following problems:

(i) minimize the number of tardy jobs when p_i = 1,
1 ≤ i ≤ n and 1 machine is available,

(ii) job sequencing with deadlines,

(iii) schedule one machine to minimize the maximum
earliness and tardiness penalties,

and 

(iv)  channel  assignment. 


2.  Minimum  Finish  Time 


When preemptions are permitted, a minimum finish time
schedule for m machines is efficiently obtained using
McNaughton's rule [17]. Let p_1, p_2, ..., p_n be the processing
times of the n jobs. The finish time, f, of an optimal
preemptive schedule is given by:

$$f = \max\Big\{ \max_{1 \le i \le n} \{p_i\},\ \frac{1}{m} \sum_{i=1}^{n} p_i \Big\}$$

Using f, the optimal schedule may be constructed in
O(n) time [17]. Job 1 is scheduled on machine 1 from 0 to
p_1 and job 2 from p_1 to min{p_1+p_2, f}. If p_1+p_2 > f, then
the remainder of job 2 is done on machine 2 starting at time
0. If p_1+p_2 ≤ f, then job 3 is scheduled on machine 1 from
p_1+p_2 to min{p_1+p_2+p_3, f}, etc.


Using the parallel algorithms of [6], max{p_i} and
Σ_{i=1}^{n} p_i may be computed in O(log n) time with n/log n PEs.
To obtain the actual schedule, we also need
A_i = Σ_{j=1}^{i} p_j, 1 ≤ i ≤ n. As mentioned in Section 1, all the
A_i's can be computed in O(log n) time using n/log n PEs ([6]).
Let A_0 = 0. Each job i can now determine its own processing
assignment by using the following rule:


x <- ⌈A_{i-1}/f⌉ * f - A_{i-1}
case
   :x = 0:    schedule job i on machine ⌈A_i/f⌉ from 0 to p_i
   :x ≥ p_i:  schedule job i on machine ⌈A_i/f⌉ from f-x to f-x+p_i
   :else:     schedule job i on machine ⌈A_{i-1}/f⌉ from f-x to f
              and on machine ⌈A_i/f⌉ from 0 to p_i-x
end case

One may verify that x gives the amount of processing
time left on the machine ⌈A_{i-1}/f⌉ after job i-1 is
finished on that machine.

Example 2.1: Suppose we have 14 jobs with processing times
as given in Figure 2.1. Let m=5. f = max{7, 50/5} = 10.
Figure 2.1 gives the A_i and x values for each job. The
schedule obtained is given in Figure 2.2. []

If we have n PEs, all the machine assignments can be
computed in O(1) time. However, using only n/log n PEs,
these assignments may be obtained in O(log n) time (i.e.,
each PE computes at most ⌈log n⌉ assignments). So, the
overall scheduling algorithm has a complexity of O(log n) and
uses n/log n PEs. So, its EPU is O(n/(log n * n/log n)) = O(1).
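
The per-job assignment rule of this section can be sketched as follows. Each loop iteration depends only on A_{i-1}, A_i and f, so on a parallel machine every job could be handled by its own PE once the partial sums are known. The function name `mcnaughton` and the tuple layout of the result are assumptions made for illustration; exact rational arithmetic is used so that the ceilings are not disturbed by rounding.

```python
import math
from fractions import Fraction
from itertools import accumulate


def mcnaughton(p, m):
    """Return (f, pieces); pieces are (job, machine, start, end), 1-based."""
    f = max(Fraction(max(p)), Fraction(sum(p), m))
    A = [0] + list(accumulate(p))          # A[i] = p_1 + ... + p_i
    pieces = []
    for i in range(1, len(p) + 1):         # independent for every job i
        p_i = p[i - 1]
        prev_machine = math.ceil(A[i - 1] / f)
        cur_machine = math.ceil(A[i] / f)
        x = prev_machine * f - A[i - 1]    # time left on the previous machine
        if x == 0:
            pieces.append((i, cur_machine, 0, p_i))
        elif x >= p_i:
            pieces.append((i, cur_machine, f - x, f - x + p_i))
        else:                              # job is split across two machines
            pieces.append((i, prev_machine, f - x, f))
            pieces.append((i, cur_machine, 0, p_i - x))
    return f, pieces
```

With three jobs of length 6 on two machines, f = 9 and job 2 is split: it runs on machine 1 from 6 to 9 and on machine 2 from 0 to 3.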
3. Minimum Mean Finish Time

A non-preemptive schedule that minimizes the mean finish
time of n jobs on m identical machines is obtained by using
the SPT (shortest processing time first) rule. By simply using
a parallel sorting algorithm, this schedule may be obtained in
O(log n) time with n^2 PEs or in O(log^2 n) time with n PEs.

Let us consider the case of m uniform parallel
machines. Associated with machine i is a speed s_i. It
takes machine i p_j/s_i time units to complete the processing
of job j. Horowitz and Sahni [13] present an O(n log mn)
algorithm that constructs a minimum mean finish time
schedule for this case. Their algorithm is reproduced in
Figure 3.1. This algorithm assumes that the speeds and
processing times have been normalized and sorted such that
s_1 = 1 ≤ s_2 ≤ ... ≤ s_m and p_1 ≤ p_2 ≤ ... ≤ p_n.

By examining this algorithm, we see that another way to
obtain an optimal schedule is to sort the mn numbers i/s_j,
1 ≤ i ≤ n, 1 ≤ j ≤ m into nondecreasing order. Let the resulting
sequence be a_1, a_2, a_3, ..., a_mn. If a_i, 1 ≤ i ≤ n, corresponds
to q/s_j, then job n+1-i is scheduled on machine j and there are
q-1 jobs following it on that machine.

This information may be obtained in O(log^2 mn) time
using a parallel sort and mn PEs or in O(log mn) time with
m^2 n^2 PEs. If we use the former sort algorithm, the EPU of our
parallel algorithm is O(n log mn/(mn log^2 mn)) = O(1/(m log mn)).
If the latter sort algorithm is used, the EPU of our
scheduling algorithm becomes O(n log mn/(m^2 n^2 log mn)) =
O(1/(m^2 n)). The actual start and finish times for each job
can be obtained by later using the partial sums algorithm of
[6].
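
The sorting formulation can be sketched sequentially as below. The names `mft_by_sorting` and `total_finish_time`, the 0-based indices, and the tie-breaking toward the faster machine (mirroring the "largest index" choice in Figure 3.1) are assumptions made for illustration; a parallel version would replace the call to `sorted` by a parallel sort.

```python
from fractions import Fraction


def mft_by_sorting(p, s):
    """p: processing times, nondecreasing; s: speeds, nondecreasing, s[0] = 1.

    Returns one list of job indices (0-based) per machine, in run order.
    """
    n, m = len(p), len(s)
    # Key q/s_j means "position q from the end on machine j".
    keys = sorted((Fraction(q) / s[j], -j, q)
                  for j in range(m) for q in range(1, n + 1))
    machines = [[] for _ in range(m)]
    for i, (_, neg_j, _q) in enumerate(keys[:n]):
        job = n - 1 - i                 # i-th smallest key gets i-th largest job
        machines[-neg_j].append(job)
    # Jobs on a machine run in increasing order of processing time.
    return [sorted(jobs) for jobs in machines]


def total_finish_time(machines, p, s):
    """Sum of job finish times for the given per-machine run orders."""
    total = 0
    for j, jobs in enumerate(machines):
        t = 0
        for job in jobs:
            t += Fraction(p[job]) / s[j]
            total += t
    return total
```

Minimizing the total finish time is equivalent to minimizing the mean, since n is fixed.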


Algorithm MFT

Input:  m processors with speeds 1, s_2, ..., s_m,
        1 ≤ s_2 ≤ ... ≤ s_m; n jobs initially sorted so that
        p_1 ≤ p_2 ≤ ... ≤ p_n where the p_i are for
        processor 1.

Output: Sets R_i, 1 ≤ i ≤ m. The jobs in R_i are to be run
        on processor i in increasing order of their
        execution times.

for j <- 1 to m - 1 do
   R_j <- ∅; t_j <- 1/s_j
end for
R_m <- {n}; t_m <- 2/s_m
//Note that the above assigns the job with the
largest processing time to the fastest processor,
m.//
for k <- n - 1 to 1 do
   Let u be the largest index such that t_u = min_j {t_j}
   R_u <- R_u ∪ {k}; t_u <- t_u + 1/s_u
end for

end MFT


Figure  3.1 


4. Number of Tardy Jobs


Let J = {(p_i, d_i) | 1 ≤ i ≤ n} define a set of n jobs. p_i is the
processing time of job i and d_i is its due time. Let S be any
one machine schedule for J. Job i is tardy in the schedule
S iff it completes after its due time d_i.


Hodgson and Moore [13] have developed an O(n log n)
sequential algorithm that obtains a schedule that minimizes
the number of tardy jobs. Dekel and Sahni [6] present an
O(log^2 n) parallel algorithm to obtain a schedule with the
fewest number of tardy jobs. This algorithm uses O(n) PEs
and has an EPU of O(1/log n).

In this section, we shall develop a parallel algorithm
for the case when p_i = 1, 1 ≤ i ≤ n. This algorithm will have a
complexity of O(log n), will require O(n^2) PEs and thus have
an EPU that is O(1/n). While the algorithm of this section
has an EPU that is inferior to that of [6], it is faster by
a log n factor. It is interesting to note that the
simplification p_i = 1 does not lead to a corresponding speed up
for the sequential case.


The problem of finding a schedule that minimizes the
number of tardy jobs is equivalent to that of selecting a
maximum cardinality subset U of J such that every job in U
can be completed by its due time. Jobs not in U can be
scheduled after those in U and will be tardy. A set of jobs
U such that every job in U can be scheduled to complete by
its due time is called a feasible set. It is well known
that a set of jobs U is feasible iff scheduling jobs in U in
nondecreasing order of due times results in no tardy jobs
(see [14], for example).

When p_i = 1, 1 ≤ i ≤ n, a maximum cardinality feasible set U
can be obtained by considering the jobs in nondecreasing
order of due times. The job j currently being considered
can be added to U iff |U| < d_j. Procedure FEAS(J,n,b) is a
slight generalization. It finds a maximum subset of J that
can be scheduled in the interval [0,b]. DONE(i) is set to -1
if job i is not selected and is set to a number greater than
0 otherwise. If DONE(i) > 0, then job i is to be scheduled
from DONE(i) - 1 to DONE(i). The procedure itself returns a
value that equals the number of jobs selected. The
correctness of FEAS is easily established using an exchange
argument. Its complexity is O(n log n) as it takes this much time
to order the jobs by due time.

line  Procedure FEAS(J,n,b)
      //select a maximum number of jobs for processing//
      //in [0,b]; n=|J|//
 1    set J; integer n,b; global DONE(1:n)
 2    sort J into nondecreasing order of due times
 3    DONE(1:n) <- -1 //initialize//
 4    j <- 0
 5    for i <- 1 to n do
 6      case
 7        :j ≥ b: return(j) //interval full//
 8        :j < d_i: //select i// j <- j+1; DONE(i) <- j
 9      end case
10    end for
11    return(j)
12    end FEAS


Figure 4.1
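
As a cross-check of the procedure, here is a small sequential sketch in Python. The function name `feas`, the 0-based job indices, and returning DONE alongside the count (instead of using a global array) are choices made for illustration only.

```python
def feas(due, b):
    """Select a maximum number of unit-time jobs for processing in [0, b].

    due[i] is the due time of job i (0-based). Returns (count, done), where
    done[i] is -1 if job i is not selected and otherwise its finish time.
    """
    order = sorted(range(len(due)), key=lambda i: due[i])  # by due time
    done = [-1] * len(due)
    j = 0
    for i in order:
        if j >= b:          # interval full
            break
        if j < due[i]:      # job i can still finish by its due time
            j += 1
            done[i] = j
    return j, done
```

Calling `feas` on fifteen unit jobs with due times 2, 2, 2, 3, 3, 3, 3, 5, 8, 9, 9, 9, 9, 11, 11 and a large b selects eleven jobs.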


Let J be a set of n unit processing time jobs. Let
D(i), 1 ≤ i ≤ k be the distinct due times of the jobs in J.
Assume that D(i) < D(i+1), 1 ≤ i < k. Let n(i) be the number of
jobs in J with due time D(i), 1 ≤ i ≤ k. Clearly, Σ_{i=1}^{k} n(i) = n.
Let D(0)=0 and n(0)=0. Define F(i) to be the value of j
when procedure FEAS (Figure 4.1) has just finished
considering all jobs in J with due time at most D(i). It is evident
that:

        F(0) = D(0) = 0
(4.1)
        F(i) = min{F(i-1)+n(i), D(i), b},  1 ≤ i ≤ k

Expanding the recurrence (4.1), we obtain:

F(1) = min{D(0)+n(1), D(1), b}

F(2) = min{F(1)+n(2), D(2), b}
     = min{D(0)+n(1)+n(2), D(1)+n(2), b+n(2), D(2), b}
     = min{D(0)+n(1)+n(2), D(1)+n(2), D(2), b}

F(3) = min{D(0)+n(1)+n(2)+n(3), D(1)+n(2)+n(3), D(2)+n(3), D(3), b}

And, in general

(4.2)   $F(m) = \min\Big\{ \min_{0 \le i \le m} \Big\{ D(i) + \sum_{q=i+1}^{m} n(q) \Big\},\ b \Big\}$,  0 ≤ m ≤ k

The maximum number of jobs in J that can be scheduled
in [0,b], b>0, so that none is tardy is F(k). F(k) may be
efficiently computed in parallel as follows. Let the due
times of the n jobs in J be d(1), d(2), ..., d(n). Let
d(0)=0. We may assume that d(i)>0, 1 ≤ i ≤ n. The computation
steps are:

Step 1: sort d(1:n) into nondecreasing order.

Step 2: determine the points r(0), ..., r(k-1) in d(0:n)
        where the due times change, i.e., r(i) < r(i+1),
        0 ≤ i < k and d(r(i)) ≠ d(r(i)+1). Let
        r(k)=n. Clearly, r(0)=0, and n(i)=r(i)-r(i-1)
        and D(i) = d(r(i)), 1 ≤ i ≤ k; D(0)=0.

Step 3: since $D(i) + \sum_{q=i+1}^{k} n(q) = D(i)+n-r(i)$, we
        compute $F(k) = \min\{ n + \min_{0 \le i \le k} \{D(i)-r(i)\},\ b \}$. []

Example 4.1: Figure 4.2(a) gives the due times of a set J of
15 jobs. In Figure 4.2(b), the jobs have been ordered by
due times. The points at which the due times change are
shown by heavy lines. We see that k=6; r(0:6) = (0, 3, 7,
8, 9, 13, 15) and D(0:6) = (0, 2, 3, 5, 8, 9, 11). So,

$$n + \min_{0 \le i \le k} \{D(i)-r(i)\} = 15 + \min\{0, -1, -4, -3, -1, -4, -4\} = 15-4 = 11.$$

If b ≥ 11, then the maximum number of nontardy jobs is 11. []
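
The three computation steps can be sketched sequentially as follows. The function name `max_nontardy` is an assumption, and the sample due times in the usage note are the sorted values implied by r(0:6) and D(0:6) of Example 4.1 rather than the original (unavailable) data of Figure 4.2(a). The sketch assumes n ≥ 1 and positive due times.

```python
def max_nontardy(due, b):
    """Return (F(k), r, D) for unit-time jobs with the given due times."""
    n = len(due)
    d = [0] + sorted(due)                       # step 1; d[0] = 0
    # Step 2: boundary points 0, every i with d(i) < d(i+1), and n.
    r = [0] + [i for i in range(1, n) if d[i] < d[i + 1]] + [n]
    D = [d[i] for i in r]
    # Step 3: F(k) = min{ n + min_i {D(i) - r(i)}, b }.
    F = min(n + min(D_i - r_i for D_i, r_i in zip(D, r)), b)
    return F, r, D
```

On the due times 2, 2, 2, 3, 3, 3, 3, 5, 8, 9, 9, 9, 9, 11, 11 (in any order) with b = 20, this reproduces r, D and the value 11 of Example 4.1.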

then 

the  maximum 

number 

of  nontardy 

With n^2 PEs, step 1 can be carried out in O(log n) time
(see [19] and [21]). Using n-1 PEs, the boundary points can
be found in O(1) time. PE(i) simply checks to see if
d(i) < d(i+1), 1 ≤ i ≤ n-1. If so, then i is a boundary point. 0
and n are also boundary points. The boundary points have
now to be moved into memory positions r(0), r(1), ..., r(k).
This can be done in O(log n) time using n PEs and the data
concentration algorithm of [20]. Another data concentration
step moves d(r(0)), d(r(1)), ..., d(r(k)) into D(0), D(1),
..., D(k). Using k+1 PEs, D(i)-r(i), 0 ≤ i ≤ k can be computed
in O(1) time. min{D(i)-r(i)} can be obtained in O(log k)
time using the binary tree computation model of [6] (Figure
4.3 shows this for our example). As explained in [6], only
O(k/log k) PEs are needed for this; but using k/2 PEs is


this  is  adequate.  To  obtain  the  actual  schedule,  we  may 
proceed  as  follows.  First,  modify  procedure  FEAS  by  adding 
the  line: 

8.1   :else: DONE(i) <- j

and  by  deleting  line  7. 

It is easy to see that job i is completed at time
DONE(i) iff DONE(i) ≤ b and DONE(i-1) < DONE(i), 1 ≤ i ≤ n. For
the modified algorithm, we see that:

        DONE(0) = 0
(4.3)   DONE(i) = min{DONE(i-1)+1, d_i},  1 ≤ i ≤ n

Solving (4.3), we obtain:

(4.4)   $DONE(i) = \min_{0 \le j \le i} \{ d_j + i - j \}$,  1 ≤ i ≤ n

DONE(i), 1 ≤ i ≤ n may be computed in O(log n) time using n^2
PEs (though n^2/log n are sufficient) and the binary
computation tree model (see [6] and Figure 4.3). Since the initial
sort takes O(log n) time and requires n^2 PEs, the overall
time complexity is O(log n) and the EPU is O(1/n). From
DONE(i), the schedule is easily obtained.

Example 4.2: For the sorted data of Example 4.1, we obtain
DONE(0:15) = (0, 1, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9, 10,
11). So the set of nontardy jobs in [0,b], b ≥ 11 is {2, 4,
1, 3, 11, 5, 8, 12, 14, 7, 9}. By concentrating these to
the left, we obtain the permutation (2, 4, 1, 3, 11, 5, 8,
12, 14, 7, 9, 15, 6, 10, 13) which represents an optimal
schedule. []
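
The equivalence of the recurrence (4.3) and its closed form (4.4) can be checked with the short sketch below. The function names are assumptions, and the input is the sorted due-time sequence implied by Example 4.1; the closed form is evaluated naively here, whereas the parallel algorithm would evaluate each minimum with a binary computation tree.

```python
def done_by_recurrence(d_sorted):
    """DONE(0:n) from (4.3); d_sorted holds d(1), ..., d(n) in sorted order."""
    done = [0]
    for d_i in d_sorted:
        done.append(min(done[-1] + 1, d_i))
    return done


def done_by_closed_form(d_sorted):
    """DONE(0:n) from (4.4), with d(0) = 0; each entry is independent."""
    d = [0] + list(d_sorted)
    return [min(d[j] + i - j for j in range(i + 1)) for i in range(len(d))]


def selected_positions(done):
    """Sorted positions i (1-based) with DONE(i-1) < DONE(i)."""
    return [i for i in range(1, len(done)) if done[i - 1] < done[i]]
```

Both functions return the DONE(0:15) vector of Example 4.2, and eleven sorted positions are selected.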


5. Job Sequencing With Deadlines

In this problem, we are given a set J of n jobs. Associated
with job i is a profit z_i and a due time d_i, 1 ≤ i ≤ n. Every
job has a processing requirement of one unit. If job i is
completed by time d_i, then a profit z_i, z_i > 0 is made. If
job i is not completed by the time d_i, then nothing is
earned. We wish to select a feasible subset of J that
yields maximum return (recall that R is a feasible subset
iff all jobs in R can be scheduled to complete on time).

One way to find a feasible subset R of J that gives
maximum return is:

Step 1: sort J into nonincreasing order of z_i
Step 2: R <- {1}
        for i <- 2 to n do
           if R ∪ {i} is feasible then R <- R ∪ {i}
        end for


Figure 5.1


A correctness proof of the above procedure may be found
in [14]. It is also possible to implement the above scheme
by a sequential algorithm of complexity O(n log n). For the
parallel version, we reduce the job sequencing with
deadlines problem into 2n independent feasibility problems.
First, we note that if R1 and R2 are feasible subsets of J
and if R1 is one with maximum return, then |R2| ≤ |R1|.

Theorem 5.1: Let A be a feasible subset of J that yields
maximum return. Let B be any feasible subset of J. Then
|B| ≤ |A|.

Proof: Since A and B are feasible subsets, they can
respectively be scheduled in [0,|A|] and [0,|B|] in such a manner
that no job is tardy. Consider such a scheduling SA of A
and SB of B. Consider a job i that is in both A and B. If i
is scheduled earlier in SA than in SB, we may change SA by
moving i to the slot it is scheduled in SB. This would
require moving the job (if any) scheduled in this slot in SA
to the position previously occupied by i (see Figure
5.2(a)). A similar transformation may be made if i is
scheduled later in SA than in SB (see Figure 5.2(b)).

By performing the above transformation on all jobs in A
∩ B, we obtain schedules SA' and SB' that contain no tardy
jobs. In addition, jobs in A ∩ B are scheduled in the same
slots in SA' and SB'.

If |B| > |A|, then there must be a job j scheduled in SB'
in a slot that is empty in SA'. Also, j ∉ A. By adding j
to A, we clearly obtain a feasible set with return larger
than that obtained from A. This contradicts the assumption
on A. So, |B| ≤ |A|. []

From the sequential algorithm for the job sequencing
problem and Theorem 5.1, we may derive a parallel algorithm.
Let T1(i) = {j | z_j > z_i or (z_j = z_i and j < i)} and T2(i) =
T1(i) ∪ {i}. Consider a schedule for T1(i) that has the
fewest number of tardy jobs. Let x(i) be the number of
nontardy jobs in this schedule. Let y(i) be the corresponding
number for T2(i). From our discussions, it follows that job
i will be included in R (Figure 5.1) iff y(i) > x(i).
Hence, R may be obtained by computing x(i) and y(i), 1 ≤ i ≤ n.
x(i) and y(i) may be computed using the parallel algorithm
for F(k) described in Section 4. From R, the optimal
schedule is easily obtained.

Figure 5.2

As far as the complexity is concerned, the initial sort by
due times can be done in O(log n) time using n^2 PEs. Next,
we need to replicate this sorted data into n copies, one to
be used for each T1(i) and T2(i). This replication can be
carried out using n^2 PEs and spending O(log n) time (the
O(log n) time is needed to avoid read conflicts). Now, the
n^2 PEs are divided into n groups of n PEs each. Group i
computes T1(i) and then T2(i). T1(i) is obtained by having
the jth PE in group i flag job j iff z_j > z_i or (z_j = z_i and
j < i). Next, the flagged jobs are concentrated in O(log n)
time using the n PEs in each group. Note that this
concentration preserves the due time ordering. The n PEs in group
i next compute x(i) = F(k), 1 ≤ i ≤ n. This takes O(log n)
time. y(i), 1 ≤ i ≤ n is computed in a manner similar to that
used to obtain x(i).

Having obtained x(i) and y(i), n PEs are used to
determine if y(i) > x(i), 1 ≤ i ≤ n. The selected jobs can be
concentrated in O(log n) time using these n PEs. The
concentration preserves the due time ordering of the selected jobs.

The overall complexity of our parallel algorithm is
therefore O(log n). It uses n^2 PEs and has an EPU of O(1/n).
This should be contrasted with the algorithm presented by us
in [6] for the same problem. That algorithm has a
complexity of O(log^2 n) but uses only O(n) PEs. Thus, its EPU is
O(1/log n).
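
The reduction to 2n independent feasibility counts can be illustrated with the sequential sketch below, which compares the x(i), y(i) test against the greedy procedure of Figure 5.1. All function names are assumptions, jobs are 0-based, and the small instance in the usage note is invented for illustration. Each iteration of the loop in `select_by_counts` is independent of the others, which is what allows one group of PEs per job.

```python
def max_on_time(dues):
    """Maximum number of unit-time jobs that can all finish on time."""
    j = 0
    for d in sorted(dues):
        if j < d:
            j += 1
    return j


def select_by_counts(z, d):
    """Job i is selected iff y(i) > x(i), as in the parallel formulation."""
    n = len(z)
    selected = []
    for i in range(n):                       # independent for every i
        t1 = [j for j in range(n)
              if z[j] > z[i] or (z[j] == z[i] and j < i)]
        x = max_on_time([d[j] for j in t1])
        y = max_on_time([d[j] for j in t1] + [d[i]])
        if y > x:
            selected.append(i)
    return selected


def select_greedy(z, d):
    """Sequential procedure of Figure 5.1 (ties broken by smaller index)."""
    order = sorted(range(len(z)), key=lambda i: (-z[i], i))
    chosen = []
    for i in order:
        candidate = chosen + [i]
        # A set of unit jobs is feasible iff all of them can finish on time.
        if max_on_time([d[j] for j in candidate]) == len(candidate):
            chosen.append(i)
    return sorted(chosen)
```

For profits (100, 10, 15, 27) and due times (2, 1, 2, 1), both functions select jobs 0 and 3.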


6.  Earliness  and  Tardiness  Penalties 


Let J be a set of n jobs. Associated with each job is a
target start time a_i, a target due time b_i, and a processing
time p_i. Any one machine schedule S for J may be denoted by
a vector (s_1, s_2, ..., s_n) where s_i is the start time of job i.
Schedule S is admissible iff s_i ≥ s_{i-1} + p_{i-1}, 2 ≤ i ≤ n. The
completion time c_i of job i is s_i + p_i. The earliness e_i
and tardiness t_i of job i are given by:

        e_i = max{0, a_i - s_i}

        t_i = max{0, c_i - b_i}


If job i is early (i.e., e_i > 0) then it incurs a
penalty g(e_i). If it is tardy (i.e., t_i > 0), then it incurs a
penalty h(t_i). The objective is to find a schedule S that
minimizes the maximum penalty. This problem was first
studied by Sidney [23]. He obtained an O(n^2) algorithm for the
case when:

(1) a_i ≤ a_j implies b_i ≤ b_j

and

(2) g() and h() are monotone nondecreasing continuous
functions such that g(0) = h(0) = 0.

Our notations and definitions are taken from Sidney's
paper. Sidney's O(n^2) algorithm was subsequently improved
to O(n log n) by Lakshminarayan et al. The parallel algorithm
we shall develop here is based on the algorithm of
Lakshminarayan et al. [16].

The algorithm of [16] first finds an admissible
schedule S using procedure ADMIS (Figure 6.1). This
procedure assumes that the jobs are ordered by target start
times (i.e., a_i ≤ a_{i+1}) and within start times by target due
times (i.e., a_i = a_{i+1} implies b_i ≤ b_{i+1}). The maximum
lateness, Δ, in S is next computed. If Δ = 0, then S is clearly
optimal (as max{e_i} = max{t_i} = 0). If Δ > 0, then E is computed
using one of their lemmas. Finally, all the start times in S are
decreased by E. The new schedule is optimal.


Procedure ADMIS(a,p,s,n)
//jobs are ordered by target start and due times//
declare n, a(1:n), p(1:n), s(1:n)
s_1 <- a_1
for i <- 2 to n do
   s_i <- max{a_i, s_{i-1} + p_{i-1}}
end for
end ADMIS


Figure 6.1


Δ can be computed in O(log n) time using n PEs (see
[6]). As described in [16], E may be computed in O(1) time
using 1 PE. Once E has been obtained, n copies of it can
be made in O(log n) time using n PEs. Finally, the s_i's can
be updated in O(1) time using n PEs. Also, the initial
ordering of the jobs may be carried out in O(log n) time with n^2
PEs. All that remains is the computation of the admissible
schedule. From Figure 6.1, we see that

(6.1)   s_i = max{a_i, s_{i-1} + p_{i-1}},  2 ≤ i ≤ n

Expanding the recurrence (6.1), we obtain

(6.2)   $s_i = \max_{1 \le j \le i} \Big\{ a_j + \sum_{k=j}^{i-1} p_k \Big\}$,  1 ≤ i ≤ n

It should be easy to see that using (6.2) and O(n^3)
PEs, one can compute all the s_i's in O(log n) time. We shall
devote the remainder of the section to the development of an
O(log n) algorithm that utilizes only n/2 PEs. As we shall
see later, ⌈n/log n⌉ PEs are all that is needed.
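
The agreement between the recurrence (6.1) and its expansion (6.2) can be checked with the following sketch (0-based indices). The function names are assumptions. The processing times in the usage note are those of the 10-job example used later in this section; two of the target start times in that example are illegible in the source, so the values a_1 = 1 and a_9 = 17 below are assumed for illustration.

```python
def admis(a, p):
    """Procedure ADMIS: s_0 = a_0, s_i = max(a_i, s_{i-1} + p_{i-1})."""
    s = [a[0]]
    for i in range(1, len(a)):
        s.append(max(a[i], s[i - 1] + p[i - 1]))
    return s


def admis_closed_form(a, p):
    """Expanded form (6.2): every s_i is computed independently."""
    return [max(a[j] + sum(p[j:i]) for j in range(i + 1))
            for i in range(len(a))]
```

Because each entry of the closed form depends only on the inputs, all of them can be evaluated simultaneously given enough PEs, which is the observation behind the first bound above.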

For convenience, we shall assume that the jobs are
indexed 0,1,...,n-1 rather than 1,2,...,n. Before
describing the algorithm, we develop some terminology. Let
S(0:n-1) be an array. A 2^k-block of S consists of all elements
of S whose indices differ only in the least significant k bits.
The 2^1-blocks of A(0:10) are [0,1], [2,3], [4,5], [6,7],
[8,9], and [10]; the 2^2-blocks are [0,1,2,3], [4,5,6,7], and
[8,9,10]. Two 2^k-blocks are sibling blocks iff their
union is a 2^{k+1}-block. Thus, [0,1] and [2,3] are sibling
blocks; so also are [0,1,2,3] and [4,5,6,7]. However, [2,3]
and [4,5] are not sibling blocks.

Let A(0:n-1) and P(0:n-1) be the target start times and
the processing times. Let [i, i+1, i+2, ..., r] be the index
set for any 2^k-block (a 2^k-block has 2^k indices unless it is
the last 2^k-block). With respect to this 2^k-block, we
define

        $S(j) = \sum_{q=i}^{j-1} P(q)$,  j is an index in this block

(6.3)   $T(j) = \sum_{q=i}^{r} P(q)$,  r is the highest index in the block

        $Q(j) = \max_{i \le q \le j} \Big\{ A(q) + \sum_{t=q}^{j-1} P(t) \Big\}$,  j is a block index

        $U(j) = Q(r) + P(r)$,  j is a block index

For a 2^0-block [i], we have:

(6.4)   S(i) = 0;  T(i) = P(i);  Q(i) = A(i);  U(i) = A(i)+P(i)


Let X = [i,i+1,...,u] and Y = [u+1,...,v] be two
sibling 2^k-blocks. Their union Z = [i,i+1,...,v] is a
2^{k+1}-block. Let S, T, Q, and U be the values defined in
(6.3) with respect to the 2^k-blocks. Let S', T', Q', and U'
be the values defined with respect to the 2^{k+1}-block Z.
From (6.3), we see that:

(6.5a)  S'(j) = S(j)                     if j ∈ X
              = S(j)+T(i)                if j ∈ Y

(6.5b)  T'(j) = T(j)+T(u+1)              if j ∈ X
              = T(j)+T(i)                if j ∈ Y

(6.5c)  Q'(j) = Q(j)                     if j ∈ X
              = max{Q(j), U(i)+S(j)}     if j ∈ Y

(6.5d)  U'(j) = Q'(v)+P(v)

One  al 30 

notes  that 

with  respect 

to 

r  log  ^ 

2 

nl 

-block  [0, 1,  •  • 

• ,n-l], 

Q(  j)  *  max  [a  +  >  p( t) } 
0<_q<_  j  ^  t=q 

»  Sj  of  (6.2) 


Our strategy is to compute the admissible schedule obtained by procedure ADMIS by using (6.5 a-d). We start with the S, T, Q, and U values for $2^0$-blocks as given by (6.4). Next, using (6.5 a-d), the S, T, Q, and U values for $2^1$-blocks are obtained; then for $2^2$-blocks; then for $2^3$-blocks; etc., until we have obtained the Q values for the entire $2^{\lceil \log_2 n \rceil}$-block.
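As a sanity check on the combining rules (6.5 a-d), the sketch below computes S, T, Q, and U for a block directly from the definitions (6.3) and compares the result with what the rules give when two sibling blocks are merged. It is a sequential illustration, not the parallel algorithm; the function names and the sample arrays `A` and `P` are assumptions made for this sketch.

```python
def block_values(A, P, lo, hi):
    """S, T, Q, U of (6.3) for the block with indices lo..hi, computed directly."""
    idx = range(lo, hi + 1)
    S = {j: sum(P[lo:j]) for j in idx}
    T = {j: sum(P[lo:hi + 1]) for j in idx}
    Q = {j: max(A[q] + sum(P[q:j]) for q in range(lo, j + 1)) for j in idx}
    U = {j: Q[hi] + P[hi] for j in idx}
    return S, T, Q, U


def merge(X, Y, i, u, v, P):
    """Combine sibling blocks X = [i..u] and Y = [u+1..v] using (6.5 a-d)."""
    (SX, TX, QX, UX), (SY, TY, QY, UY) = X, Y
    S, Q, T = dict(SX), dict(QX), {}
    for j in range(i, u + 1):              # j in X
        T[j] = TX[j] + TY[u + 1]           # (6.5b), first case
    for j in range(u + 1, v + 1):          # j in Y
        S[j] = SY[j] + TX[i]               # (6.5a), second case
        T[j] = TY[j] + TX[i]               # (6.5b), second case
        Q[j] = max(QY[j], UX[i] + SY[j])   # (6.5c), second case
    U = {j: Q[v] + P[v] for j in range(i, v + 1)}  # (6.5d)
    return S, T, Q, U


# Illustrative data (not taken from the paper's figures).
A = [0, 2, 3, 9, 10, 11, 20, 21]
P = [3, 1, 4, 2, 2, 5, 1, 2]
merged = merge(block_values(A, P, 0, 3), block_values(A, P, 4, 7), 0, 3, 7, P)
print(merged == block_values(A, P, 0, 7))  # True
```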

Example 6.1: Figure 6.2 gives a set of 10 jobs (indexed 0 through 9). The first row of Figure 6.3 gives the S, T, Q, and U values for the $2^0$-blocks; etc. The numbers with arrows give PE assignments. From the bottommost row, we obtain s = (0, 3, 5, 8, 12, 13, 16, 20, 21, 24) as the admissible schedule. []


    i      0   1   2   3   4   5   6   7   8   9
    P(i)   3   2   2   4   1   3   4   1   3   4
    A(i)   ?   ?   4   ?   9   9  15  15  16   ?

(? marks an entry that is illegible in the source.)

Figure 6.2: An example data set


Let us now proceed to the formal algorithm. In the actual algorithm, processors are assigned to compute the new values of S, T, Q, and U. Assume that the PEs are indexed 0, 1, ..., ⌊n/2⌋-1. With respect to our example of Figure 6.3, when k=0, PE(0) will compute the new values of S(1), T(0), T(1), Q(1), U(0), and U(1); PE(1) will compute S(3), T(2), T(3), Q(3), U(2), and U(3); etc. When k=1, PEs 0 and 1 are both assigned to the new $2^2$-block being constructed. PEs 2 and 3 are assigned to the block [4,5,6,7]. PEs 4 and 5 are idle.

Figure 6.3: Computing the admissible schedule

Let $\ldots i_3 i_2 i_1 i_0$ be the binary representation of i. The PE assignment rule is obtained by defining the function $f(i,j) = \ldots i_{j+1}\, i_j\, 0\, i_{j-1} \ldots i_0$, i.e., the binary representation of i with a 0 inserted at bit position j. When $2^k$-blocks are being combined, PE(i) computes $S(f(i,k)+2^k)$, $T(f(i,k))$, $T(f(i,k)+2^k)$, $Q(f(i,k)+2^k)$, $U(f(i,k))$, and $U(f(i,k)+2^k)$ (provided of course that all these indices are less than n). The formal algorithm is given in Figure 6.4. This algorithm mirrors equations (6.5 a-d). Some minor modifications have however been made. Since T( ) is the same for all indices in a $2^k$-block, S(j)+T(i) of (6.5a) has been replaced by $S(j)+T(j-2^k)$. Similarly, T(j)+T(u+1) has been replaced by $T(j)+T(j+2^k)$; and U(i)+S(j) by $U(j-2^k)+S(j)$. Note that as a result of this change, new T and U values for the rightmost block may be incorrect (as $j+2^k$ may exceed n-1). This does not affect the outcome of the algorithm as the T and U values of rightmost blocks are never used. One may verify that $\max\{U(j+2^k),\ U(j)+T(j+2^k)\} = Q'(v)+P(v)$ (Eq. 6.5d). When $k = \lceil \log_2 n \rceil - 1$, only Q need be computed.

procedure PADMIS(A,P,s,n)
  //obtain the admissible schedule (s_0, s_1, ..., s_{n-1})//
  declare n, A(0:n-1), P(0:n-1), S(0:n-1), T(0:n-1)
  declare Q(0:n-1), U(0:n-1), j, i
  for each PE(i) do in parallel
    j ← f(i,0)
    //initialize 2^0-blocks//
    S(j) ← 0; T(j) ← P(j); Q(j) ← A(j); U(j) ← A(j)+P(j)
    S(j+1) ← 0; T(j+1) ← P(j+1)
    Q(j+1) ← A(j+1); U(j+1) ← A(j+1)+P(j+1)
    for k ← 0 to ⌈log₂ n⌉ - 1 do
      //combine 2^k-blocks//
      j ← f(i,k) //PE assignment//
      if j + 2^k < n then
        Q(j+2^k) ← max{Q(j+2^k), U(j)+S(j+2^k)}
        U(j+2^k) ← max{U(j+2^k), U(j)+T(j+2^k)}
        U(j) ← U(j+2^k)
        S(j+2^k) ← S(j+2^k)+T(j)
        T(j+2^k) ← T(j)+T(j+2^k)
        T(j) ← T(j+2^k)
      endif
    end for
  end for
  s_i ← Q(i), 0 ≤ i < n
end PADMIS

Figure 6.4: Parallel admissible schedule algorithm.
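The procedure of Figure 6.4 can be simulated on a single processor to check its output. The sketch below is such a simulation under stated assumptions, not a parallel implementation: the inner loop over `i` stands in for the PEs, which is safe because in each round PE(i) touches only the indices j and j+2^k, and these are distinct across PEs. The names `insert_zero_bit`, `padmis`, and `sequential_schedule` and the sample data are assumptions of this sketch; `sequential_schedule` uses the recurrence s_j = max{A(j), s_{j-1} + P(j-1)}, which is equivalent to the max formula for Q(j) over the whole index range.

```python
def insert_zero_bit(i, k):
    """f(i, k): the binary representation of i with a 0 inserted at bit k."""
    low = i & ((1 << k) - 1)       # bits below position k stay in place
    high = i >> k                  # remaining bits move up by one position
    return (high << (k + 1)) | low


def padmis(A, P):
    """Sequential simulation of PADMIS; returns the schedule s as a list."""
    n = len(A)
    S = [0] * n                              # 2^0-block values
    T = list(P)
    Q = list(A)
    U = [a + p for a, p in zip(A, P)]
    rounds = (n - 1).bit_length()            # equals ceil(log2 n) for n >= 1
    for k in range(rounds):
        step = 1 << k
        for i in range(n // 2):              # one iteration per PE
            j = insert_zero_bit(i, k)
            if j + step < n:
                Q[j + step] = max(Q[j + step], U[j] + S[j + step])
                U[j + step] = max(U[j + step], U[j] + T[j + step])
                U[j] = U[j + step]
                S[j + step] = S[j + step] + T[j]
                T[j + step] = T[j] + T[j + step]
                T[j] = T[j + step]
    return Q


def sequential_schedule(A, P):
    """Reference: s_0 = A(0), s_j = max(A(j), s_{j-1} + P(j-1))."""
    s = [A[0]]
    for j in range(1, len(A)):
        s.append(max(A[j], s[-1] + P[j - 1]))
    return s


# Illustrative data (not the data of Figure 6.2).
A = [0, 2, 3, 9, 10, 11, 20]
P = [3, 1, 4, 2, 2, 5, 1]
print(padmis(A, P))  # [0, 3, 4, 9, 11, 13, 20]
```

Here n = 7 is deliberately not a power of two, so the last block is short and the guard `j + step < n` is exercised.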


The complexity of PADMIS is readily seen to be O(log n). It uses n/2 PEs. By dividing the jobs into ⌈n/log n⌉ groups, each of size at most log n, it is possible to compute the $s_i$'s in O(log n) time using ⌈n/log n⌉ PEs. This requires combining the sequential and parallel algorithms together. We omit the details. However, this grouping technique has been used in other problems. The details can be found in [6]. With this grouping technique, the parallel admissible schedule algorithm will have an EPU of O(1).


The overall complexity of the parallel algorithm to minimize earliness and tardiness penalties is determined by the sort (to order jobs). This takes O(log n) time and uses n² PEs. The EPU is O(1/n).


7. Channel Assignment

The channel assignment problem occurs naturally as a wire routing problem. Components of an electrical circuit are laid out in a straight line as in Figure 7.1. Certain pairs of components are to be connected using only two vertical runs and one horizontal run of wire (as in Figure 7.1). The horizontal and vertical runs are physically located in different layers. Each horizontal run of wire lies in a 'channel'. No channel can simultaneously carry more than one wire. We are required to assign the horizontal wire runs to channels, using the least number of channels. The assignment of Figure 7.1 uses 3 channels.


[Figure: components laid out along a line, with channels 1, 2, and 3 above them]

Figure 7.1: Wiring with 3 channels.


In the mathematical formulation of this problem, we are given n pairs of points $(a_i, b_i)$, 1 ≤ i ≤ n. Each pair $(a_i, b_i)$ is to be joined by a continuous horizontal run of wire. These wires are to be assigned to channels in such a way that the number of channels used is minimum. In the example of Figure 7.1, n=4; the pairs of points are (1,4), (2,5), (3,7), and (6,8); the channel assignment is: (1,4) and (6,8) in channel 1, (2,5) in channel 2, and (3,7) in channel 3.

The job sequencing problem with release times and due times [8] is similar to the channel assignment problem. Suppose we are given a set J of n jobs. Associated with each job is a release time $r_i$, a due time $d_i$, and a processing time $p_i$. A feasible schedule is one in which no job is processed before its release time; all jobs complete by their respective due times; and jobs are processed without interruption from start to finish. We are required to find a feasible schedule that uses the fewest number of machines. One readily sees that when $r_i + p_i = d_i$, 1 ≤ i ≤ n, this problem is identical to the channel assignment problem. When this restriction on $r_i$, $p_i$, and $d_i$ is removed, the problem is NP-hard.

The fastest sequential algorithm known for the channel assignment problem is due to Gupta, Lee, and Leung [9]. This algorithm runs in O(n log n) time and consists of the following steps:

step 1: Sort the multiset $\{a_i \mid 1 \le i \le n\} \cup \{b_i \mid 1 \le i \le n\}$
        of the 2n end points into nondecreasing order.
step 2: m ← 0; stack ← empty
step 3: process the 2n points one by one
        if the point being processed is an $a_i$
        then if stack empty then m ← m+1
                                 assign this wire run to channel m
                            else unstack a channel from the stack
                                 and assign the wire to this channel
             endif
        else put the channel used by this wire onto the stack
        endif
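The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `assign_channels` is hypothetical, wires are numbered from 0, and when an a point and a b point share a coordinate the a point is processed first (a tie-breaking choice the steps above leave open).

```python
def assign_channels(wires):
    """Stack-based channel assignment for a list of (a, b) pairs with a < b.

    Returns (channel, m): channel[k] is the channel of wire k (wires are
    numbered from 0 here) and m is the number of channels used.
    """
    # Step 1: sort the 2n end points; kind 0 marks an a point, 1 a b point.
    events = sorted([(a, 0, k) for k, (a, b) in enumerate(wires)] +
                    [(b, 1, k) for k, (a, b) in enumerate(wires)])
    # Step 2: no channels allocated yet, no free channel on the stack.
    m = 0
    stack = []
    channel = [0] * len(wires)
    # Step 3: process the points one by one.
    for _, kind, k in events:
        if kind == 0:
            if not stack:
                m += 1                       # open a new channel
                channel[k] = m
            else:
                channel[k] = stack.pop()     # reuse the most recently freed one
        else:
            stack.append(channel[k])         # wire k ends; its channel is free
    return channel, m


print(assign_channels([(1, 4), (2, 5), (3, 7), (6, 8)]))  # ([1, 2, 3, 2], 3)
```

For the four wires of Figure 7.1 the sketch places (6,8) in channel 2 rather than channel 1. Both assignments are valid and use 3 channels.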


In the above three step algorithm of [9], the final value of m is the fewest number of channels needed. The assignment is constructed while this number is being determined. It is possible to determine this number without actually obtaining a channel assignment. Let $c_1, c_2, \ldots, c_{2n}$ be the sorted sequence of 2n end points. Set $z_i = 1$ if $c_i$ is an $a_j$ and $z_i = -1$ if $c_i$ is a $b_j$. It is easy to see that $r_j = \sum_{i=1}^{j} z_i$ gives the number of wires that either start at $c_j$ or cross the point $c_j$. Further, $\max_{1 \le j \le 2n} \{r_j\}$ is the number of channels needed to route the n wire segments.
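A short sequential sketch of this counting argument follows. It uses `itertools.accumulate` for the partial sums in place of a parallel partial sums algorithm; the function name `channels_needed` and the rule that a points precede b points at equal coordinates are assumptions of the sketch.

```python
from itertools import accumulate


def channels_needed(wires):
    """Return (max r_j, [r_1, ..., r_2n]) for a nonempty list of (a, b) pairs."""
    # Sorted end points c_1..c_2n; kind 0 marks an a point, 1 a b point.
    ends = sorted([(a, 0) for a, b in wires] + [(b, 1) for a, b in wires])
    z = [1 if kind == 0 else -1 for _, kind in ends]
    r = list(accumulate(z))      # r_j = z_1 + ... + z_j
    return max(r), r


print(channels_needed([(1, 4), (2, 5), (3, 7), (6, 8)]))
# (3, [1, 2, 3, 2, 1, 2, 1, 0])
```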

$r_i$, 1 ≤ i ≤ 2n, can be computed using the partial sums algorithm of [6]. This algorithm takes O(log n) time and uses ⌈n/log n⌉ PEs. The largest $r_i$ can be found in O(log n) time using ⌈n/log n⌉ PEs. The initial ordering of the a's and b's can be done in O(log n) time using n² PEs. If this sorting algorithm is used, the resulting parallel algorithm to determine the fewest number of channels has a time complexity of O(log n) and an EPU of O(1/n). If the O(log² n), n PE sorting algorithm of [21] is used instead, the time complexity is O(log² n) and the EPU is O(1/log n).

Example 7.1: Figure 7.2 gives a set of wires. Figure 7.3 shows the results of the different steps of the parallel algorithm to determine the fewest number of channels needed. This number is 4. []

Figure 7.2

Figure 7.3


The actual channel assignment can be obtained from the $r_j$'s (recall that $r_j = \sum_{i=1}^{j} z_i$), 1 ≤ j ≤ 2n. Assume that $c_j$ corresponds to $a_k$. Let q be the largest index such that q < j, $r_q = r_j - 1$, and $c_q$ corresponds to a b (say $b_p$). If no such q exists, set q to 0. An examination of the algorithm of Gupta et al. reveals that if q = 0, then the channel used by $(a_k, b_k)$ has not been used earlier. If q ≠ 0, then it was most recently used in the interval $(a_p, b_p)$. To see the truth of this, note that at point $b_p$, the channel assigned to $(a_p, b_p)$ is put into the stack. This channel remains in the stack until we reach the nearest point at which the number of wires that start or cross is one more than the number at $b_p$ (if $a_i = a_j$ and i < j, then we say that $a_i$ is before $a_j$). For every j such that $c_j$ is an a point (say $a_k$), let L(k) = p as defined above, with L(k) = 0 when q = 0.

L(j) partitions the set of n wires into sets. Figure 7.4 gives the partitioning for the example of Figure 7.3. Each wire is represented by a circle. The circle with index i inside it represents the wire $(a_i, b_i)$. L(j) may be interpreted as a left link. Figure 7.4 shows the partitions as linked lists with L( ) being shown as a leftward arrow. We leave it to the reader to see how the L( ) values may be obtained in O(log n) time using n²/log n PEs.

Figure 7.4: Partitions for Example 7.3
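The left links can be computed directly from their definition. The sketch below is a plain sequential O(n²) illustration, not the O(log n) parallel method left to the reader. It takes the previous user of a channel to be the wire whose b point is the nearest earlier end point with r value one less than the r value at the current a point, which matches the stack argument. The function name `left_links`, the 1-based wire numbering with a dummy entry 0, and the a-before-b tie rule are assumptions of this sketch.

```python
from itertools import accumulate


def left_links(wires):
    """Return (L, Q0) for wires given as (a, b) pairs; wire k is wires[k-1].

    L[k]  = wire that most recently used wire k's channel, or 0 if none.
    Q0[k] = r value at a_k when L[k] == 0 (a newly opened channel), else 0.
    """
    n = len(wires)
    # End point records (coordinate, kind, wire); kind 0 = a point, 1 = b point.
    pts = sorted([(a, 0, k) for k, (a, b) in enumerate(wires, 1)] +
                 [(b, 1, k) for k, (a, b) in enumerate(wires, 1)])
    r = list(accumulate(1 if kind == 0 else -1 for _, kind, _ in pts))
    L = [0] * (n + 1)
    Q0 = [0] * (n + 1)
    for j, (_, kind, k) in enumerate(pts):
        if kind != 0:
            continue
        # Largest q < j such that c_q is a b point and r_q = r_j - 1.
        for q in range(j - 1, -1, -1):
            if pts[q][1] == 1 and r[q] == r[j] - 1:
                L[k] = pts[q][2]
                break
        else:
            Q0[k] = r[j]        # no such q: wire k opens channel r_j
    return L, Q0


print(left_links([(1, 4), (2, 5), (3, 7), (6, 8)]))
# ([0, 0, 0, 0, 2], [0, 1, 2, 3, 0])
```

In this small example wire 4 links to wire 2, so it inherits channel 2, while wires 1, 2, and 3 open channels 1, 2, and 3.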


The channel assignment Q(k) for a wire k with L(k) = 0 is obtained from the r value corresponding to $a_k$. If L(k) ≠ 0, we may initially set Q(k) = 0. The actual channel assignments for wires with L(k) ≠ 0 may be obtained by simultaneously collapsing the linked lists and transmitting the channel assignment within the lists as below:

for j ← 1 to ⌈log n⌉ do
  for each i for which Q(i)=0 do in parallel
    if L(L(i))=0 then Q(i) ← Q(L(i))
    L(i) ← L(L(i))
  end for
end for
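The list-collapsing loop can be simulated sequentially, provided the parallel semantics are respected: every read in a round must see the values from before that round. The sketch below does this by writing into fresh copies of L and Q. The function name `propagate_channels` and the convention of 1-based arrays with a dummy entry 0 (where L[0] = 0) are assumptions of this sketch.

```python
def propagate_channels(L, Q):
    """Simulate the pointer-jumping loop; L and Q are 1-based with a dummy entry 0.

    On entry Q[k] is the channel for wires with L[k] == 0 and 0 otherwise.
    Returns the completed channel assignment.
    """
    n = len(L) - 1
    L, Q = list(L), list(Q)
    rounds = (n - 1).bit_length() if n > 0 else 0   # ceil(log2 n)
    for _ in range(rounds):
        new_L, new_Q = list(L), list(Q)             # all PEs read old values
        for i in range(1, n + 1):
            if Q[i] == 0:
                if L[L[i]] == 0:
                    new_Q[i] = Q[L[i]]              # L[i] already has its channel
                new_L[i] = L[L[i]]                  # jump over one link
        L, Q = new_L, new_Q
    return Q


# Two lists: 1 <- 3 <- 5 and 2 <- 4; wires 1 and 2 hold channels 1 and 2.
print(propagate_channels([0, 0, 0, 1, 2, 3], [0, 1, 2, 0, 0, 0]))
# [0, 1, 2, 1, 2, 1]
```

Each round doubles the distance a link spans, so a wire at depth d in its list receives its channel in round ⌊log₂ d⌋ + 1, which is at most ⌈log₂ n⌉.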


The parallel complexity of the above scheme is O(log n). Therefore, the overall complexity of our parallel channel assignment algorithm is O(log n) (i.e., using the O(log n), n² PE sorting algorithm); its EPU is O(1/n).




8. Conclusions

The extent to which parallel computers will find application will depend largely on our ability to find efficient algorithms for them. In this paper we have examined several scheduling problems. The single processor algorithm for each of these appeared to be highly sequential in nature. A closer look revealed a parallel structure that led to efficient parallel algorithms. Several other scheduling problems can be solved efficiently using the techniques of this paper and of [22].

Some examples are:

(a) 2 machine flow shop scheduling to minimize finish time

(b) 2 machine open shop scheduling to minimize finish time

(c) 2 machine flow shop scheduling, with no wait in process, to minimize finish time


The parallel algorithms for the above problems involve a rather straightforward application of parallel sorting and partial sums. For example, consider problem (a). Here, we simply divide the job set into two classes: (i) jobs which need less time on machine 1 than on machine 2; (ii) the remaining jobs. Jobs in (i) are sorted into nondecreasing order of their machine 1 processing times. Jobs in (ii) are sorted into nonincreasing order of their machine 2 processing times. The optimal processing permutation consists of jobs in (i) in sorted order followed by those in (ii) in sorted order. One readily sees that this permutation satisfies Jackson's rule [15].
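The two-class ordering for problem (a) can be sketched sequentially as below. In this sketch class (ii) is ordered by nonincreasing machine 2 time, which is what the standard two machine flow shop rule (commonly known as Johnson's rule) requires. The function names and the job data are illustrative assumptions, and the finish time is computed by a simple sequential scan rather than by parallel partial sums.

```python
def flow_shop_order(jobs):
    """jobs: list of (t1, t2) processing times on machines 1 and 2.

    Returns a permutation of job indices: class (i) by nondecreasing t1,
    followed by class (ii) by nonincreasing t2.
    """
    class_i = sorted((k for k, (t1, t2) in enumerate(jobs) if t1 < t2),
                     key=lambda k: jobs[k][0])
    class_ii = sorted((k for k, (t1, t2) in enumerate(jobs) if t1 >= t2),
                      key=lambda k: jobs[k][1], reverse=True)
    return class_i + class_ii


def finish_time(jobs, order):
    """Finish time of the last job on machine 2 for the given permutation."""
    end1 = end2 = 0
    for k in order:
        end1 += jobs[k][0]                   # machine 1 runs without idle time
        end2 = max(end2, end1) + jobs[k][1]  # machine 2 waits for machine 1
    return end2


# Illustrative job data (not from the paper).
jobs = [(3, 6), (5, 2), (1, 2), (6, 6), (7, 5)]
order = flow_shop_order(jobs)
print(order, finish_time(jobs, order))  # [2, 0, 3, 4, 1] 24
```

For this instance the finish time of 24 equals the lower bound given by the total machine 1 time (22) plus the smallest machine 2 time (2), so the permutation is optimal.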
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